Abstract 

We describe a flavor tagging algorithm used in measurements of the CP violation 
parameter $\sin 2\phi_1$ at the Belle experiment. Efficiencies and wrong tag fractions 
are evaluated using flavor-specific B meson decays into hadronic and semileptonic 
modes. We achieve a total effective efficiency of 28.8 ± 0.6%. 

1 Introduction 



In the Standard Model (SM) of elementary particles, CP violation arises from 
an irreducible complex phase in the weak interaction quark-mixing matrix 
(CKM matrix) [1]. In particular, the SM predicts a CP-violating asymmetry 
in the time-dependent rates for $B^0$ and $\bar{B}^0$ decays to a common CP eigenstate 
$f_{CP}$ [2]. This CP-violating asymmetry in $f_{CP}$ final states dominated by the 
$b \to c\bar{c}s$ transition has recently been observed by the Belle and BaBar groups [3,5]. The 
measurement at Belle is based on a sample of $B\bar{B}$ pairs collected at the $\Upsilon(4S)$ 
resonance at the KEKB asymmetric-energy $e^+e^-$ collider. In the decay chain 
$\Upsilon(4S) \to B^0\bar{B}^0 \to f_{CP} f_{\rm tag}$, where one of the two B mesons decays at time $t_{CP}$ 
to $f_{CP}$ and the other, at time $t_{\rm tag}$, to a final state $f_{\rm tag}$ that distinguishes $B^0$ 
and $\bar{B}^0$, the decay rates and their asymmetry are time dependent. The time 
dependence for $b \to c\bar{c}s$ transitions is given by 



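$$\mathcal{P}_{\rm sig}(\Delta t, q, \xi_f) = \frac{e^{-|\Delta t|/\tau_{B^0}}}{4\tau_{B^0}} \left[1 - q\,\xi_f \sin 2\phi_1 \sin(\Delta m_d \Delta t)\right], \quad (1)$$

$$A_{CP} \equiv \frac{\mathcal{P}_{\rm sig}(\Delta t, q, \xi_f) - \mathcal{P}_{\rm sig}(\Delta t, -q, \xi_f)}{\mathcal{P}_{\rm sig}(\Delta t, q, \xi_f) + \mathcal{P}_{\rm sig}(\Delta t, -q, \xi_f)} = -q\,\xi_f \sin 2\phi_1 \sin(\Delta m_d \Delta t), \quad (2)$$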

where $\mathcal{P}_{\rm sig}$ represents the normalized decay rate, $\tau_{B^0}$ is the $B^0$ lifetime, $\xi_f$ is 
the CP eigenvalue of $f_{CP}$, $\Delta m_d$ is the mass difference between the two $B^0$ mass 
eigenstates, $\Delta t = t_{CP} - t_{\rm tag}$, and the b-flavor charge $q = +1$ ($-1$) when the tagging 
B meson is a $B^0$ ($\bar{B}^0$). The CP parameter $\phi_1$ is one of the three interior angles 
of the CKM unitarity triangle, defined as $\phi_1 \equiv \pi - \arg(V_{tb}^* V_{td} / V_{cb}^* V_{cd})$. The 
CP eigenstates are reconstructed from $B \to J/\psi K_S$, $\psi(2S) K_S$, $\chi_{c1} K_S$, $\eta_c K_S$, 
$J/\psi K^{*0}$ ($K^{*0} \to K_S \pi^0$), or $J/\psi K_L$ decays. 

An identification of the flavor of the accompanying B meson, called flavor 
tagging in this article, is required to observe this kind of CP-violating 
asymmetry. A perfect tagging algorithm with a perfect detector would tag every B 
meson that decays into a flavor-specific decay mode and would identify its flavor 
correctly. A practical tagging algorithm in a realistic detector tags 
only a fraction $\epsilon$ (the tagging efficiency) of B mesons, and of those tagged, 
only a fraction is identified correctly. The fraction of tagged B mesons 
identified incorrectly is called the wrong tag fraction $w$. The observed time 
dependence $\mathcal{P}^{\rm obs}_{\rm sig}$ thus becomes 



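$$\mathcal{P}^{\rm obs}_{\rm sig}(\Delta t, q, w, \xi_f) = \epsilon \left[(1 - w)\,\mathcal{P}_{\rm sig}(\Delta t, q, \xi_f) + w\,\mathcal{P}_{\rm sig}(\Delta t, -q, \xi_f)\right]$$

$$= \epsilon\,\frac{e^{-|\Delta t|/\tau_{B^0}}}{4\tau_{B^0}} \left[1 - (1 - 2w)\,q\,\xi_f \sin 2\phi_1 \sin(\Delta m_d \Delta t)\right], \quad (3)$$

and the observed CP-violating asymmetry $A^{\rm obs}_{CP}$, 

$$A^{\rm obs}_{CP} \equiv \frac{\mathcal{P}^{\rm obs}_{\rm sig}(\Delta t, q, w, \xi_f) - \mathcal{P}^{\rm obs}_{\rm sig}(\Delta t, -q, w, \xi_f)}{\mathcal{P}^{\rm obs}_{\rm sig}(\Delta t, q, w, \xi_f) + \mathcal{P}^{\rm obs}_{\rm sig}(\Delta t, -q, w, \xi_f)}$$

$$= -(1 - 2w)\,q\,\xi_f \sin 2\phi_1 \sin(\Delta m_d \Delta t) = (1 - 2w) A_{CP}. \quad (4)$$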

Here, we ignore a possible small difference between the wrong tag fractions for 
$q = +1$ and $q = -1$. The observed CP-violating asymmetry $A^{\rm obs}_{CP}$ is diluted 
by $(1 - 2w)$, which is called the dilution factor. The statistical significance of 
the asymmetry measurement is proportional to $(1 - 2w)\sqrt{\epsilon}$, i.e. the number of 
events required to observe the asymmetry for a certain statistical significance 
is inversely proportional to $\epsilon_{\rm eff} = \epsilon(1 - 2w)^2$, which is called the "effective 
tagging efficiency". At the same time, since the factor $(1 - 2w) \sin 2\phi_1$ 
is proportional to the amplitude of the observed CP asymmetry, the wrong tag 
fraction $w$ directly affects the central value of $\sin 2\phi_1$. Therefore, a precise 
measurement of $w$ is crucial in order to minimize the systematic uncertainty 
in the $\sin 2\phi_1$ measurement. 
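
The relations between $\epsilon$, $w$, the dilution factor and the effective tagging efficiency can be checked numerically. The following is a minimal Python sketch; the function names are illustrative and are not taken from the Belle software, and the input numbers are arbitrary examples rather than measured values.

```python
def dilution(w):
    """Dilution factor (1 - 2w) for a wrong tag fraction w."""
    return 1.0 - 2.0 * w


def observed_asymmetry(a_cp, w):
    """Observed asymmetry: the true asymmetry scaled by the dilution factor."""
    return dilution(w) * a_cp


def effective_efficiency(eps, w):
    """Effective tagging efficiency eps * (1 - 2w)^2."""
    return eps * dilution(w) ** 2


if __name__ == "__main__":
    eps, w = 1.0, 0.25  # arbitrary illustrative inputs
    print("dilution            :", dilution(w))
    print("observed asymmetry  :", observed_asymmetry(0.8, w))
    print("effective efficiency:", effective_efficiency(eps, w))
```

With these example inputs a tagger that tags every event but is wrong a quarter of the time has the statistical power of a perfect tagger applied to only one quarter of the events.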

Our tagging algorithm has been developed to maximize $\epsilon_{\rm eff}$ while making it 
possible to determine the value of $w$ experimentally. The flavor tagging 
described in this paper is used not only for the $\sin 2\phi_1$ measurement but also 
for other measurements such as $\Delta m_d$ measurements [18,19] and measurements 
of CP-violating asymmetries in the decay $B \to \pi^+\pi^-$ [20]. In this paper, we 
present an algorithm for flavor tagging and describe its performance. The 
experimental apparatus of the Belle experiment is described in the next section. 
The flavor tagging algorithm is described in Section 3, and the measurement 
of its performance and wrong tag fractions with the control samples of 
self-tagged neutral B decays, in Section 4. 



2 Experimental Apparatus 

The Belle experiment is conducted at the KEKB energy-asymmetric $e^+$ (3.5 GeV) 
$e^-$ (8.0 GeV) collider with a crossing angle of 22 mrad. The corresponding 
center-of-mass (CMS) energy is 10.58 GeV, which is on the $\Upsilon(4S)$ resonance. 
The $\Upsilon(4S)$ decays into $B\bar{B}$ pairs with a Lorentz boost of $(\beta\gamma)_{\Upsilon(4S)} = 0.425$ 
nearly along the z axis, which is defined as opposite to the positron beam 
direction. The time difference between the two B meson decays is measured 
from the distance between the two B decay vertices ($\Delta t = \Delta z/\beta\gamma c$). 
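
As a worked illustration of the relation $\Delta t = \Delta z/\beta\gamma c$, the short Python sketch below converts a vertex separation into a decay time difference using the boost quoted above. The function name and the 200 μm example separation are assumptions chosen for illustration only.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum
BETA_GAMMA = 0.425         # Lorentz boost of the Upsilon(4S) quoted in the text


def delta_t_ps(delta_z_um, beta_gamma=BETA_GAMMA):
    """Convert a vertex separation in micrometres to a time difference in ps."""
    delta_z_m = delta_z_um * 1e-6
    return delta_z_m / (beta_gamma * C_M_PER_S) * 1e12


if __name__ == "__main__":
    # A separation of 200 um (illustrative) corresponds to roughly 1.57 ps.
    print(f"{delta_t_ps(200.0):.3f} ps")
```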

The Belle detector [6] is a general-purpose spectrometer surrounding the 
interaction point. It consists of barrel, forward and backward components. It 
is placed in such a way that the axis of the detector solenoid is parallel to 
the z axis. In this way, the Lorentz force on the low energy positron beam is 
minimized. 

Precision tracking and vertex measurements are provided by a central drift 
chamber (CDC) [7] and a silicon vertex detector (SVD) [8]. The CDC is a 
small-cell cylindrical drift chamber with 50 layers of anode wires including 18 
layers of stereo wires. A low-Z gas mixture [He (50%) and C$_2$H$_6$ (50%)] is used 
to minimize multiple Coulomb scattering and to ensure a good momentum 
resolution, especially for low momentum particles. It provides three-dimensional 
trajectories of charged particles in the polar angle region $17° < \theta < 150°$ in 
the laboratory frame, where $\theta$ is measured with respect to the z axis. The 
SVD consists of three layers of double-sided silicon strip detectors arranged 
in a barrel and covers 86% of the solid angle. The three layers at radii of 3.0, 
4.5 and 6.0 cm surround the beam-pipe, a double-wall beryllium cylinder of 
2.3 cm outer radius and 1 mm thickness. The strip pitches are 84 μm for the 
measurement of the z coordinate and 25 μm for the measurement of the azimuthal 
angle $\phi$. The impact parameter resolution for reconstructed tracks is measured 
as a function of the track momentum $p$ (measured in GeV/c) to be 
$\sigma_{xy} = [19 \oplus 50/(p\beta \sin^{3/2}\theta)]$ μm and $\sigma_z = [36 \oplus 42/(p\beta \sin^{5/2}\theta)]$ μm. The momentum 
resolution of the combined tracking system is $\sigma_{p_t}/p_t = (0.30/\beta \oplus 0.19\,p_t)\%$, 
where $p_t$ is the transverse momentum in GeV/c. 

The identification of charged pions and kaons uses three detector systems: the 
CDC measurements of dE/dx, a set of time-of-flight counters (TOF) [9] and 
a set of aerogel Cherenkov counters (ACC) [10]. The CDC measures energy 
loss for charged particles with a resolution of $\sigma(dE/dx) = 6.9\%$ for 
minimum-ionizing pions. The TOF consists of 128 plastic scintillators viewed on both 
ends by fine-mesh photo-multipliers that operate stably in the 1.5 T magnetic 
field. Their time resolution is 95 ps (rms) for minimum-ionizing particles, 
providing three standard deviation ($3\sigma$) $K^\pm/\pi^\pm$ separation below 1.0 GeV/c, 
and $2\sigma$ up to 1.5 GeV/c. The ACC consists of 1188 aerogel blocks with 
refractive indices between 1.01 and 1.03 depending on the polar angle. Fine-mesh 
photo-multipliers detect the Cherenkov light. The effective number of 
photoelectrons is approximately 6 for $\beta = 1$ particles. Using this information, 
$P(K/\pi) = {\rm Prob}(K) / ({\rm Prob}(K) + {\rm Prob}(\pi))$, the probability for a particle to 
be a $K^\pm$ meson, is calculated. A selection with $P(K/\pi) > 0.6$ retains about 
90% of the charged kaons with a charged pion misidentification rate of about 
6%. 
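
The kaon probability defined above is a simple likelihood ratio. The Python sketch below illustrates it; combining the per-detector likelihoods as a product is an assumption made here for illustration (the text does not spell out the combination), and all names are hypothetical.

```python
import math


def combined_likelihood(per_detector):
    """Combine per-detector likelihoods (dE/dx, TOF, ACC).

    Assumption: the detectors are treated as independent, so the
    likelihoods are multiplied. The text does not state the combination.
    """
    return math.prod(per_detector)


def kaon_probability(like_k, like_pi):
    """P(K/pi) = Prob(K) / (Prob(K) + Prob(pi))."""
    total = like_k + like_pi
    if total <= 0.0:
        raise ValueError("likelihoods must have a positive sum")
    return like_k / total


def is_kaon(like_k, like_pi, threshold=0.6):
    """Apply the P(K/pi) > 0.6 selection quoted in the text."""
    return kaon_probability(like_k, like_pi) > threshold


if __name__ == "__main__":
    # Illustrative per-detector likelihoods, not real detector output.
    l_k = combined_likelihood([0.9, 0.8, 0.7])
    l_pi = combined_likelihood([0.3, 0.5, 0.4])
    print(kaon_probability(l_k, l_pi), is_kaon(l_k, l_pi))
```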

Photons are reconstructed in a CsI(Tl) crystal calorimeter (ECL) [11] 
consisting of 8736 crystal blocks, 16.1 radiation lengths ($X_0$) thick. Their energy 
resolution is 1.8% for photons above 3 GeV. The ECL covers the same 
angular region as the CDC. Electron identification [12] in Belle is based on a 
combination of dE/dx measurements in the CDC, the response of the ACC, 
the position and the shape of the electromagnetic shower, as well as the ratio 
of the cluster energy to the particle momentum. The electron identification 
efficiency is determined from the two-photon $e^+e^- \to e^+e^-e^+e^-$ processes to be 
more than 90% for $p > 1.0$ GeV/c. The hadron misidentification probability, 
determined using tagged pions from inclusive $K_S \to \pi^+\pi^-$ decays, is below 
0.5%. 

All the detectors mentioned above are inside a super-conducting solenoid of 
1.7 m radius that generates a 1.5 T magnetic field. The outermost spectrometer 
subsystem is a $K_L$ and muon detector (KLM) [13], which consists of 14 
layers of iron absorber (4.7 cm thick) alternating with resistive plate counters 
(RPC). The KLM system covers polar angles between 20 and 155 degrees. 
Muon identification is based on the depth of penetrated KLM layers and the 
position matching from the CDC. The overall muon identification efficiency, 
determined by using the two-photon process $e^+e^- \to e^+e^-\mu^+\mu^-$ and simulated 
muons embedded in $B\bar{B}$ candidate events, is greater than 90% for tracks with 
$p > 1$ GeV/c detected in the CDC. The corresponding pion misidentification 
probability, determined using $K_S \to \pi^+\pi^-$ decays, is less than 2%. 



3 Flavor Tagging Algorithm 

3.1 Principle of Flavor Tagging 

We determine the flavor of the accompanying B meson based on the flavor 
information of the final state particles that belong to $f_{\rm tag}$. Namely, the flavor 
of $f_{\rm tag}$ can be determined from the charge (flavor) of 

(1) high-momentum leptons from $B^0 \to X\ell^+\nu$ decays, 

(2) kaons, since the majority of them originate from $B^0 \to K^+X$ decays 
through the cascade transition $\bar{b} \to \bar{c} \to \bar{s}$, 

(3) intermediate momentum leptons from $\bar{b} \to \bar{c} \to \bar{s}\ell^-\bar{\nu}$ decays, 

(4) high momentum pions coming from $B^0 \to D^{(*)-}\pi^+X$ decays, 

(5) slow pions from $B^0 \to D^{*-}X$, $D^{*-} \to \bar{D}^0\pi^-$ decays, and 

(6) $\Lambda$ baryons from the cascade decay $\bar{b} \to \bar{c} \to \bar{s}$. 

The flavor tagging algorithm cannot always determine the flavor of the B 
mesons from the final state particles; i.e. $\epsilon < 1$ in general. This is caused by: 

• inefficiency in particle detection and identification, 

• flavor-nonspecific decay processes such as $B^0 \to \bar{D}^0\pi^0$, $\bar{D}^0 \to K_S^0\pi^0$, 

• processes that have very little information on b-flavor such as $b \to c\bar{u}d$, 
$c \to K_S^0 X$, for which the charged particles in the final state are all pions. 

The incorrect assignment of the flavor is mainly caused by 

• particle misidentification and 

• smaller physical processes that give a flavor estimate that is opposite to 
the dominant process, e.g. the charged kaon from the $\bar{c}$ decay in $b \to c\bar{c}s$ 
processes. 

To maximize $\epsilon_{\rm eff}$, we need to maximize $\epsilon$ and minimize $w$ using all available 
information for each event. As described in detail in the following subsections, 
a larger $\epsilon_{\rm eff}$ is obtained by treating events with large $w$'s and small $w$'s 
separately. For this purpose, we use an expected event-by-event dilution factor $r$, 
which is described in detail in the next section. We first find a signature of 
the aforementioned flavor-specific categories in each charged track and/or 
$\Lambda$ baryon candidate in an event. We assign $r$ to each track/$\Lambda$ candidate. We 
combine all such particle-level $r$'s taking their correlations into account, and 
estimate the $r$ value for the event. We classify events into six regions according 
to their $r$ values. The $r$ value is determined using Monte Carlo (MC) 
simulation and is related to $w$ as $r = 1 - 2w$ if the MC simulates the data perfectly. 
For each region, we assign a wrong tag fraction $w$, which is measured using 
the control data sample. The values of $w$ are used along with decay time 
information in an unbinned maximum likelihood fit to determine the asymmetry 
parameter $\sin 2\phi_1$. 



3.2 Flavor Tagging Algorithm 



We use two parameters, $q$ and $r$, as the flavor tagging outputs. The parameter 
$q$ is the flavor of the tag-side B, as defined in Section 1. The parameter $r$ is an 
expected flavor dilution factor that ranges from zero for no flavor information 
($w \simeq 0.5$) to unity for unambiguous flavor assignment ($w \simeq 0$). In order to 
obtain a high overall effective efficiency, we must assign the best estimated 
flavor dilution factor to each event. To best accomplish this, we use multiple 
discriminants in the event. Using a multi-dimensional look-up table binned by 
the values of the discriminants, the signed probability, $q \cdot r$, is given by 

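$$q \cdot r \equiv \frac{N(B^0) - N(\bar{B}^0)}{N(B^0) + N(\bar{B}^0)}, \quad (5)$$

where $N(B^0)$ and $N(\bar{B}^0)$ are the numbers of $B^0$ and $\bar{B}^0$ in each bin of the 
look-up table prepared from a large statistics MC event sample. For example, 
consider a table with only one bin. The tagging efficiency and effective tagging 
efficiency can be written as 

$$\epsilon \propto N_0(B^0) + N_0(\bar{B}^0), \quad (6)$$

$$\epsilon r^2 = \epsilon\,|q \cdot r|^2 \propto \frac{[N_0(B^0) - N_0(\bar{B}^0)]^2}{N_0(B^0) + N_0(\bar{B}^0)}. \quad (7)$$

If we subdivide this bin into two bins, the tagging efficiencies, flavor dilution 
factors and effective tagging efficiencies can be written, for $i = 1, 2$, as 

$$(q \cdot r)_i = \frac{N_i(B^0) - N_i(\bar{B}^0)}{N_i(B^0) + N_i(\bar{B}^0)}, \quad (8)$$

$$\epsilon_i \propto N_i(B^0) + N_i(\bar{B}^0), \quad (9)$$

$$\epsilon_i r_i^2 \propto \frac{[N_i(B^0) - N_i(\bar{B}^0)]^2}{N_i(B^0) + N_i(\bar{B}^0)}. \quad (10)$$

Using $\sum_{i=1,2} N_i(B^0) = N_0(B^0)$ and $\sum_{i=1,2} N_i(\bar{B}^0) = N_0(\bar{B}^0)$, the sum of the 
effective efficiencies of the two bins becomes 

$$\sum_{i=1,2} \epsilon_i r_i^2 = \epsilon r^2 + \frac{\epsilon_1 \epsilon_2 (r_1 - r_2)^2}{\epsilon} \geq \epsilon r^2, \quad (11)$$

where $r_i$ and $r$ denote the signed values $(q \cdot r)_i$ and $q \cdot r$. 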



If $r_1 \neq r_2$, then the effective efficiency increases by subdividing. This can 
easily be generalized to the case of subdivision into n bins. The increase of 
the effective tagging efficiency is proportional to the dispersion of the flavor 
dilution factor, $\langle (r - \bar{r})^2 \rangle$, and is never negative. The number of bins is 
practically limited by the quality and quantity of the Monte Carlo simulation data 
sample. Therefore, bins with a large dispersion of $r$ and with sufficient Monte 
Carlo statistics are subdivided. 
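
The gain from subdividing a bin can be verified with a small numerical example. The Python sketch below builds $q \cdot r$ from MC counts as in the look-up table definition and compares one bin with the same events split into two bins. The counts are invented for illustration and the function names are hypothetical.

```python
def q_times_r(n_b0, n_b0bar):
    """Signed dilution factor of one look-up table bin."""
    return (n_b0 - n_b0bar) / (n_b0 + n_b0bar)


def effective_efficiency(bins, n_total):
    """Sum of eps_i * r_i^2 over bins.

    bins    : list of (N(B0), N(B0bar)) counts, one pair per bin
    n_total : total number of MC B mesons used for normalisation
    """
    total = 0.0
    for n_b0, n_b0bar in bins:
        eps_i = (n_b0 + n_b0bar) / n_total
        total += eps_i * q_times_r(n_b0, n_b0bar) ** 2
    return total


if __name__ == "__main__":
    n_total = 1000
    one_bin = [(600, 400)]              # r = 0.2
    two_bins = [(450, 50), (150, 350)]  # r1 = 0.8, r2 = -0.4, same events
    print("one bin :", effective_efficiency(one_bin, n_total))
    print("two bins:", effective_efficiency(two_bins, n_total))
```

In this example the gain equals $\epsilon_1\epsilon_2(r_1 - r_2)^2/\epsilon = 0.5 \times 0.5 \times 1.2^2 = 0.36$, so the effective efficiency rises from 0.04 to 0.40.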



Figure 1 shows a schematic diagram of the flavor tagging method. The flavor 
tagging proceeds in two stages: the track stage and the event stage. In the track 
stage, each pair of oppositely charged tracks is examined to satisfy criteria for 
the $\Lambda$-like particle category. The remaining charged tracks are sorted into 
slow-pion-like, lepton-like and kaon-like particle categories. The b-flavor and 
its dilution factor of each particle, $(q \cdot r)_X$, in the four categories is estimated 
using discriminants such as track momentum, angle and particle identification 
information. In the second stage, the results from the first stage are combined 
to obtain the event-level value of $q \cdot r$. 

In the Belle detector, there is a small asymmetry between particle and 
anti-particle production and detection. For example, the observed yields and 
signal-to-noise ratios of $\Lambda$ and $\bar{\Lambda}$ candidates are different due to differences in 
interactions of the protons and anti-protons in the detector and differences in their 
yields in the background. In our method, $\Lambda$ and $\bar{\Lambda}$ are treated separately and 
the effect of the small charge asymmetry is automatically taken into account. 
The $\Lambda$ candidates have higher tagging efficiency than $\bar{\Lambda}$ candidates due to the 
larger yields of protons in the background. On the other hand, $r$ for $\Lambda$'s is 
lower than $r$ for $\bar{\Lambda}$'s as the look-up table for $\Lambda$'s contains larger backgrounds 
that are generated correctly in the MC simulation. For other tagging 
categories such as lepton-, kaon- and slow-pion-like tracks, there are also small 
asymmetries, which are treated in the same way as for $\Lambda$ candidates. 

Using the MC-determined flavor dilution factor $r$ as a measure of the tagging 
quality is a straightforward and powerful way of taking into account 
correlations among various tagging discriminants. Using two stages, we keep the 
look-up tables small enough to provide sufficient MC statistics for each bin. In 
the following we provide details about each stage of the flavor tagging. Four 
million $B^0\bar{B}^0$ MC events corresponding to eight million B's are used to 
generate the particle-level look-up tables. To reduce statistical fluctuations of the $r$ 
values in the particle-level look-up tables, the $r$ value in each bin is calculated 
by including events in nearby bins with small weights. The event-level look-up 
table is prepared using MC samples that are statistically independent of those 
used to generate the track-level tables to avoid any bias from a statistical 
correlation between the two stages. Seven million $B^0\bar{B}^0$ MC events corresponding 
to fourteen million B's are used to create the event-level look-up table. We use 
GEANT3 [14] to fully simulate the detector. Two event generators, QQ [15] and 
EvtGen [16], are used to simulate the tag-side B meson decays. We used 
QQ-generated MC for early measurements of $\sin 2\phi_1$ [3,4] and EvtGen-generated 
MC for more recent $\sin 2\phi_1$ measurements [17]. 



3.3 Particle-level Flavor Tagging 

For the particle-level flavor tagging, charged tracks that do not belong to 
$f_{CP}$ and that satisfy the impact parameter requirements $|dr| < 2$ cm and 
$|dz| < 10$ cm are considered. To find $K_S$ and $\Lambda$ candidates, we also use pairs of 
oppositely charged tracks that do not belong to $f_{CP}$ according to a secondary 
vertex reconstruction algorithm. Tracks that are a part of a $K_S$ candidate or 
a $\Lambda$ candidate are not used. However, the number of $K_S$'s in the event is used 
as a discriminant in the $\Lambda$-like and kaon-like particle categories. 

3.3.1 Electron-like and Muon-like Track Categories 

A track is assigned to the electron-like track category if the CMS momentum 
$p_\ell^{\rm cms}$ is larger than 0.4 GeV/c and the ratio of its electron and kaon likelihoods 
is larger than 0.8. A track is passed to the muon-like track category if the track 
has $p_\ell^{\rm cms}$ larger than 0.8 GeV/c and the ratio of its muon and kaon likelihoods 
is larger than 0.95. The likelihoods are calculated by combining the ACC, 
TOF, dE/dx, and ECL or KLM information. 

The discriminants for lepton-like track categories are summarized in Table 1. 
The charge of the particle provides the b-flavor $q$ and other discriminants 
determine its quality $r$. The identifier "e or μ" specifies whether a track belongs 
to the electron-like or the muon-like track categories. The lepton identification 
is optimized to reduce the kaon contamination, while a substantial fraction of 
pions is included in the muon-like track category. Prompt pions from the 
virtual W decay sometimes preserve the charge (flavor) of the W boson and 
therefore can be used to identify the flavor of the B meson. About a half 
of such pions are included in the muon-like track categories and the rest, in 
the kaon-like track category. Such pions in the muon-like track category are 
included in the bins with low lepton ID probability and with high momentum. 

The variables $p_\ell^{\rm cms}$ and $\theta_{\rm lab}$ are used to subdivide the table as the purity and 
the efficiency of the lepton identification vary as a function of these variables. 
The $p_\ell^{\rm cms}$ requirement is also set to accept most of the primary leptons from 
the B decays but includes some contamination due to secondary leptons from 
D decays. Since the leptons from cascade D decays give an incorrect flavor 
assignment, separation from the primary leptons in B decay is important. The 
variable $p_\ell^{\rm cms}$ discriminates primary leptons that tend to have higher momenta. 
The variables $M_{\rm recoil}$ and $P_{\rm miss}^{\rm cms}$ are calculated using all the observed charged 
and neutral particles that do not belong to $f_{CP}$. A neutrino from a semileptonic 
B decay carries away more momentum than one from a semileptonic D decay. 
The $M_{\rm recoil}$ distribution for semileptonic B decays peaks around the D mass 
and has a tail toward the lower side due to missing particles, while the one 
for semileptonic D decays extends widely up to 5 GeV/$c^2$ since $M_{\rm recoil}$ is 
calculated including the decay products of the primary B meson. 

Figure 2 shows the $p_\ell^{\rm cms}$, $M_{\rm recoil}$ and $P_{\rm miss}^{\rm cms}$ distributions for the data and the 
MC. Although some disagreement is visible, the experimental bias due to 
this disagreement is found to be negligible since $w$ is evaluated from control 
samples as described in Section 4. 

Within the lepton categories, leptons from semileptonic B decays yield the 
highest effective efficiency while leptons from $B \to D$ cascade decays and 
high-momentum pions from $B^0 \to D^{(*)-}\pi^+X$ make small additional contributions. 

3.3.2 Slow-pion-like Track Category 

A track that has CMS momentum below 0.25 GeV/c and is not identified 
as a kaon is assigned to the slow-pion-like track category. The discriminant 
variables for the slow-pion-like track category are given in Table 2. The largest 
background is from other (i.e. non-$D^*$ daughter) low momentum pions. Since 
the Q value is small, the pion from the $D^{*\pm} \to D\pi^\pm$ decay has a low 
momentum and has a flight direction that follows the $D^*$ direction. We use $\alpha_{\rm thr}$, 
the angle between the direction of the slow-pion-like track and the axis of the 
thrust calculated from the tag-side particles in the CMS, to select $B \to D^{*-}\pi$, $D^{*-}\rho$ 
decays. 

The other background in this category is from electrons produced in photon 
conversions and $\pi^0$ Dalitz decays. Electrons coming from photon conversion 
are identified through the secondary vertex reconstruction algorithm and are 
rejected. To separate slow pions from the remaining electrons, we use only 
dE/dx because the tracks do not have the transverse momentum ($p_t > 
0.3$ GeV/c) needed to reach the ECL detector, which gives E/p and other useful 
information to discriminate electrons from pions. The $\pi/e$ ID probability from 
dE/dx strongly depends on the laboratory momentum; the dE/dx losses are 
equal for electrons and pions around $p_{\rm lab} = 0.2$ GeV/c. Thus, we use $p_{\rm lab}$ 
instead of $p_{\rm cms}$ in this category. Figure 3 shows the distribution of $\cos\alpha_{\rm thr}$ and 
the momenta of the slow pion candidates in the laboratory frame, which are 
uniquely determined by the values of $p_{\rm lab}$ and $\theta_{\rm lab}$. 

3.3.3 Kaon-like Track Category 

If a track does not fall into any of the categories described above, and is not 
positively identified as a proton, it is classified as a kaon-like track. Charged 
kaons from $b \to c \to s$ are included in this category. Some pions are also 
included in this category. The discriminants for this category are listed in 
Table 3. The variables $p^{\rm cms}$, $\theta_{\rm lab}$ and $K/\pi$ ID separate kaons from pions. For $K/\pi$ 
identification, the information from the dE/dx, TOF and ACC detectors is 
combined into a likelihood variable and a single kaon probability is calculated. 
The table is subdivided into $p^{\rm cms}$ and $\theta_{\rm lab}$ bins as the purity of kaons changes 
as a function of these variables. 

The kaon-like track category is subdivided into two parts: events with and 
without $K_S$ decays (a switch "w/ or w/o $K_S$"), since they have different 
purities; a kaon accompanied by $K_S$'s tends to originate from a strange 
quark in a $b \to c\bar{c}(d, s)$ decay or from $s\bar{s}$ popping, while one without $K_S$'s has 
a higher probability to be from the cascade decay ($b \to c \to s$). 

The bins for low kaon probability and high $p^{\rm cms}$ contain fast pions. Fast pions 
from a prompt B decay, such as $B \to D\pi\,(\rho)$, have flavor information and give 
some contribution to $\epsilon_{\rm eff}$. Figure 4 shows the CMS momentum distribution of 
kaon-like track candidates in data compared to that in MC. 
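
The track category assignment described in Sections 3.3.1 to 3.3.3 can be summarized as a short decision function. The Python sketch below uses only the thresholds quoted in the text; the `Track` fields, the category strings and the precedence of the electron-like test over the muon-like test are assumptions made for illustration, not the Belle implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    p_cms: float      # CMS momentum in GeV/c
    e_over_k: float   # electron/kaon likelihood ratio (hypothetical field)
    mu_over_k: float  # muon/kaon likelihood ratio (hypothetical field)
    is_kaon: bool     # positively identified as a kaon
    is_proton: bool   # positively identified as a proton


def classify_track(t: Track) -> Optional[str]:
    """Return the particle-level tagging category of a track, or None."""
    if t.p_cms > 0.4 and t.e_over_k > 0.8:
        return "electron-like"
    if t.p_cms > 0.8 and t.mu_over_k > 0.95:
        return "muon-like"
    if t.p_cms < 0.25 and not t.is_kaon:
        return "slow-pion-like"
    if not t.is_proton:
        return "kaon-like"  # everything else that is not a proton
    return None


if __name__ == "__main__":
    print(classify_track(Track(1.2, 0.9, 0.1, False, False)))
    print(classify_track(Track(0.1, 0.0, 0.0, False, False)))
```

Because the slow-pion momentum range lies below both lepton thresholds, the order of the lepton and slow-pion tests does not affect the result.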

3.3.4 $\Lambda$-like Particle Category 

$\Lambda$ candidates are selected from pairs of oppositely charged tracks, one of which 
is identified as a proton, that also satisfy 1.1108 GeV/$c^2$ < $M_{p\pi}$ < 1.1208 GeV/$c^2$, 
$\theta_{\rm defl} < 30°$ and $|\Delta z| < 4.0$ cm, with a secondary vertex position in the $r$-$\phi$ 
plane above 0.5 cm. Figure 5 shows the $M_{p\pi}$ distributions for the data and 
the MC. The discriminants for this category as well as the definitions of $M_{p\pi}$, 
$\theta_{\rm defl}$ and $\Delta z$ are listed in Table 4. The last column is the number of bins for 
the corresponding discriminant. 

As the purity (signal-to-noise ratio) of the $\Lambda$ candidates varies as a function 
of the variables $M_{p\pi}$, $\theta_{\rm defl}$ and $\Delta z$, we subdivide this category using these 
variables. Since the number of $\Lambda$ candidates is small, for each discriminant we 
subdivide into two (high and low quality) bins. The high quality bin contains 
$\Lambda$ candidates with 1.1148 GeV/$c^2$ < $M_{p\pi}$ < 1.1168 GeV/$c^2$, $\theta_{\rm defl} < 10°$ and 
$|\Delta z| < 0.5$ cm. 
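
The $\Lambda$ selection and its quality binning amount to a set of window cuts. The following Python sketch encodes the cut values quoted above; the function and argument names are hypothetical, and returning one high/low flag per discriminant is one possible reading of the binning described in the text.

```python
def is_lambda_candidate(m_ppi, theta_defl_deg, dz_cm, r_vertex_cm, has_proton):
    """Baseline selection for a Lambda-like track pair.

    m_ppi          : proton-pion invariant mass in GeV/c^2
    theta_defl_deg : theta_defl in degrees (defined in Table 4 of the paper)
    dz_cm          : Delta z in cm
    r_vertex_cm    : secondary vertex position in the r-phi plane in cm
    has_proton     : one of the two tracks is identified as a proton
    """
    return (
        has_proton
        and 1.1108 < m_ppi < 1.1208
        and theta_defl_deg < 30.0
        and abs(dz_cm) < 4.0
        and r_vertex_cm > 0.5
    )


def quality_flags(m_ppi, theta_defl_deg, dz_cm):
    """Return (mass, angle, dz) flags, True meaning the high quality bin."""
    return (
        1.1148 < m_ppi < 1.1168,
        theta_defl_deg < 10.0,
        abs(dz_cm) < 0.5,
    )


if __name__ == "__main__":
    # Illustrative candidate, not real data.
    print(is_lambda_candidate(1.1158, 5.0, 0.2, 1.0, True))
    print(quality_flags(1.1158, 5.0, 0.2))
```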



3.4 Event-level Flavor Tagging 

The track-level $(q \cdot r)_X$'s are combined for event-level tagging. From the 
lepton-like and slow-pion-like track categories, the track with the highest $r$ value from 
each category is chosen as the input to the event-level look-up table. The flavor 
dilution factors of the kaon-like and $\Lambda$-like particle candidates are combined 
by calculating the product of the flavor dilution factors in order to account 
for the cases with multiple s quark contents in an event. The product of 
flavor dilution factors gives better effective efficiency than taking the track 
with the highest $r$. Table 5 shows the discriminants for event-level tagging. 
By using a three-dimensional look-up table, the correlations between flavor 
information for lepton-like, slow-pion-like, kaon-like and $\Lambda$-like particles are 
correctly taken into account. In the event-level look-up table, one of the bins 
in Table 5 corresponds to "empty" for the case when there is no output from 
a particular particle-level category. 
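
The two combination rules can be sketched in Python as follows. Selecting the track with the largest $|q \cdot r|$ follows directly from the text. The product rule shown here, which multiplies the $B^0$ and $\bar{B}^0$ probabilities $(1 \pm q \cdot r)/2$ of each candidate and renormalizes, is an assumption: the text only states that a product of flavor dilution factors is used, and the exact form is not given. All names are hypothetical.

```python
def best_by_r(qr_values):
    """Return the q.r with the largest |q.r|, or None for an empty category."""
    return max(qr_values, key=abs) if qr_values else None


def combine_product(qr_values):
    """Combine several q.r values into one (assumed likelihood-product form).

    Each candidate contributes a B0 weight (1 + q.r) and a B0bar weight
    (1 - q.r); the weights are multiplied and turned back into a signed
    dilution factor.
    """
    if not qr_values:
        return None
    plus, minus = 1.0, 1.0
    for qr in qr_values:
        plus *= 1.0 + qr
        minus *= 1.0 - qr
    if plus + minus == 0.0:
        return 0.0  # fully contradictory inputs carry no flavor information
    return (plus - minus) / (plus + minus)


if __name__ == "__main__":
    print(best_by_r([0.3, -0.9, 0.1]))
    print(combine_product([0.6, 0.6]))   # consistent kaons reinforce each other
    print(combine_product([0.6, -0.6]))  # contradictory kaons cancel
```

With this form a single candidate keeps its own $q \cdot r$, consistent candidates move the result toward ±1, and contradictory candidates cancel.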

Figure 6 shows the distributions of input values for the event layer. The 
distribution of $(q \cdot r)_{\rm lepton}$ has peaks around ±1 due to the high momentum primary 
leptons from semileptonic B decays. The effective efficiency for the lepton 
category is 12% according to MC simulation. The distribution of $(q \cdot r)_{K/\Lambda}$ 
has peaks around ±0.6, which correspond to events with a single kaon or a $\Lambda$ 
candidate. The entries around ±1.0 correspond to events with multiple kaon 
and/or $\Lambda$ candidates with consistent flavor information. Kaons have higher 
yields but less flavor information compared to leptons. The combined effective 
efficiency estimated for kaon and $\Lambda$ categories is 18% according to MC. The 
distribution of $(q \cdot r)_{\pi_s}$ contains no entries beyond ±0.7, as the slow-pion-like 
track category has much more background than other categories. The effective 
efficiency for the slow-pion-like track category is estimated to be 6% according 
to MC. The peaks around zero in the three distributions are due to pions that 
have little flavor information. 

The probability that we can assign a non-zero value for $r$ is 99.6% according 
to MC; i.e. almost all the reconstructed $f_{CP}$ candidates can be used to extract 
$\sin 2\phi_1$. Using a MC sample that is statistically independent of those used to 
generate the look-up tables, we estimate the effective efficiency to be 29.3 ± 
0.1%. Since the lepton-like, kaon/$\Lambda$-like and slow-pion-like particle tagging 
categories are not exclusive, the effective tagging efficiency is smaller than 
the sum of the efficiencies for the individual particle categories. We compare 
the distribution of $q \cdot r$ obtained from the control data samples with the MC 
expectation. As shown in Figure 7, the data and MC are in good agreement. 

Finally, we obtain the wrong tag fraction $w$ for each event from the 
MC-determined event-level flavor dilution factor $r$. Trusting the Monte Carlo 
simulation completely, we could assign $w$ for each event from the relation 
$r = 1 - 2w$. However, using the method described in the following section, 
we measure $w$ using the control data samples of flavor-specific decays and 
use the measured $w$ values in the unbinned maximum likelihood fit to extract 
$\sin 2\phi_1$. All tagged events are sorted into six subsamples according to the 
value of $r$: $0 < r \le 0.25$, $0.25 < r \le 0.5$, $0.5 < r \le 0.625$, $0.625 < r \le 0.75$, 
$0.75 < r \le 0.875$ and $0.875 < r \le 1$. Wrong tag fractions $w_l$ are measured 
for the six regions. The average value of $r$ for each region, $\langle r_l \rangle$, and the measured 
wrong tag fraction $w_l$ should satisfy $\langle r_l \rangle \simeq 1 - 2w_l$ if the MC that is used for 
constructing the look-up tables simulates generic B decays correctly. Using 
the measured (and therefore average) $w$ value for the region instead of a $w$ 
value calculated for each event from MC, we introduce no systematic bias into 
the measurement of $\sin 2\phi_1$ from the Monte Carlo simulation, although the 
effective tagging efficiency degrades. The degradation from the subdivision into 
$r$ bins is estimated to be about 0.5%, according to a Monte Carlo study. 
Nevertheless, with such a categorization based on the $r$ value, we can achieve a 
higher effective tagging efficiency than using the traditional method of treating 
lepton, kaon and slow pion tags separately; by using a conventional tagging 
method with kaons and high momentum leptons, we obtain a total effective 
tagging efficiency of 22.2 ± 0.1%. 
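
The sorting into six $r$ regions and the resulting total effective efficiency can be sketched in Python as below. Treating each upper boundary as inclusive is an assumption about the bin edges, the region index 0 for untagged events is a convention introduced here, and the example fractions and wrong tag fractions are invented, not the measured values of Table 6.

```python
import bisect

# Region boundaries quoted in the text.
R_EDGES = [0.0, 0.25, 0.5, 0.625, 0.75, 0.875, 1.0]


def r_region(r):
    """Return the region index 1..6 for 0 < r <= 1, or 0 if r == 0 (untagged)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    if r == 0.0:
        return 0
    # bisect_left makes each upper edge belong to the lower region.
    return bisect.bisect_left(R_EDGES, r)


def total_effective_efficiency(fractions, wrong_tags):
    """Sum over regions of eps_l * (1 - 2 w_l)^2."""
    return sum(f * (1.0 - 2.0 * w) ** 2 for f, w in zip(fractions, wrong_tags))


if __name__ == "__main__":
    print([r_region(r) for r in (0.1, 0.25, 0.3, 0.7, 0.9, 1.0)])
    # Two-region toy example with invented numbers.
    print(total_effective_efficiency([0.5, 0.5], [0.5, 0.0]))
```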

We have also investigated the dependence of flavor tagging performance on 
MC. We prepared two sets of look-up tables, a QQ-generated table and an 
EvtGen-generated table. Comparing the QQ-generated and the EvtGen-generated 
tables, we find the EvtGen-generated table has the larger effective tagging 
efficiency. As a result, we switched to EvtGen-MC tables starting with the 
updated $\sin 2\phi_1$ analysis [17]. In this section, the performance of the flavor 
tagging with EvtGen-MC is discussed and that with QQ-MC is referred to for 
comparison purposes only. The performance of the latter is described in [4]. 



4 Flavor Tagging Performance 

The flavor tagging performance is evaluated using the control samples of 
self-tagged B-meson decays, which are described in Appendix A. The flavor 
tagging efficiency, $\epsilon$, is measured to be 99.8%, which is consistent with the MC 
expectation. The wrong tag fraction $w$ is obtained by fitting the time-dependent 
$B^0$-$\bar{B}^0$ mixing oscillation signal. The analysis method is similar to the one 
used in the previous Belle $B^0$-$\bar{B}^0$ mixing analysis [18,19]. The time evolution 
of neutral B-meson pairs with opposite flavor (OF) or same flavor (SF) is 
given by: 



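$$\mathcal{P}_{\rm OF(SF)}(\Delta t) = \frac{e^{-|\Delta t|/\tau_{B^0}}}{4\tau_{B^0}} \left[1 \pm (1 - 2w) \cos(\Delta m_d \Delta t)\right], \quad (12)$$

and the OF-SF asymmetry, 

$$A_{\rm mix} \equiv \frac{\mathcal{P}_{\rm OF} - \mathcal{P}_{\rm SF}}{\mathcal{P}_{\rm OF} + \mathcal{P}_{\rm SF}} = (1 - 2w) \cos(\Delta m_d \Delta t). \quad (13)$$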
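
The time evolution and the OF-SF asymmetry can be evaluated directly. The Python sketch below implements the ideal (resolution-free, background-free) expressions only; the lifetime and mass-difference defaults are rough placeholder values chosen for illustration and are not the world averages used in the fit.

```python
import math


def p_of_sf(dt, w, tau, dm, opposite_flavor):
    """Ideal OF or SF decay rate at time difference dt.

    dt, tau : in ps;  dm : in 1/ps;  w : wrong tag fraction
    """
    sign = 1.0 if opposite_flavor else -1.0
    envelope = math.exp(-abs(dt) / tau) / (4.0 * tau)
    return envelope * (1.0 + sign * (1.0 - 2.0 * w) * math.cos(dm * dt))


def mixing_asymmetry(dt, w, tau=1.5, dm=0.5):
    """(P_OF - P_SF) / (P_OF + P_SF); defaults are placeholders."""
    p_of = p_of_sf(dt, w, tau, dm, True)
    p_sf = p_of_sf(dt, w, tau, dm, False)
    return (p_of - p_sf) / (p_of + p_sf)


if __name__ == "__main__":
    w = 0.2  # illustrative wrong tag fraction
    for dt in (0.0, 2.0, 6.0):
        print(dt, mixing_asymmetry(dt, w))
```

At $\Delta t = 0$ the asymmetry equals $1 - 2w$, which is why the amplitude of the oscillation measures the wrong tag fraction.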



We obtain the wrong tag fraction using an unbinned maximum likelihood fit to
the reconstructed $\Delta t$ distribution of the SF and OF events with $\tau_{B^0}$ and $\Delta m_d$
fixed to the world average values [22]. The likelihood function $L$ is defined as:

$$L = \prod_i P_{\rm OF}(\Delta t_i) \times \prod_j P_{\rm SF}(\Delta t_j), \qquad (14)$$

where the index $i$ ($j$) runs over all selected OF (SF) events. The function
$P_{\rm OF(SF)}$ is the sum of the signal probability density function smeared by the $\Delta t$
resolution and the background component, written as:

$$P_{\rm OF(SF)}(\Delta t) = (1 - f_{\rm ol}) \left[ f_{\rm sig} \int \mathcal{P}^{\rm OF(SF)}_{\rm sig}(\Delta t') \, R(\Delta t - \Delta t') \, d\Delta t' + (1 - f_{\rm sig}) \, \mathcal{P}_{\rm bkg}(\Delta t) \right] + f_{\rm ol} \, \mathcal{P}_{\rm ol}(\Delta t),$$

where $f_{\rm sig}$ and $\mathcal{P}_{\rm bkg}$ are the signal fraction and the background $\Delta t$ shape,
respectively, whose description can be found elsewhere [18,19], and $R$ is the
$\Delta t$ resolution function. A small fraction of events at large $\Delta t$ (outliers) are
represented by a Gaussian of a large width, $\mathcal{P}_{\rm ol}$. The outlier fraction $f_{\rm ol}$, the
width of $\mathcal{P}_{\rm ol}$ and the resolution function $R$ are determined in the $B$ lifetime
analysis [21].

Figure 8 shows the measured OF-SF asymmetries as a function of $\Delta t$. Figure 9
shows the measured $1 - 2w$ vs. $r$, which confirms the validity of our tagging
method.

The fit results are summarized in Table 6. We also evaluate the wrong tag
fractions for $B^0$-tagged and $\bar{B}^0$-tagged samples separately as a check,
since they can be different due to charge asymmetries in the particle identification
and proton fakes in $\Lambda$ candidates. The $w_l$ values for the two samples
are listed in Table 7. We will use separate $w_l$ for $B^0$-tagged and $\bar{B}^0$-tagged
samples as the statistics of the control samples increases.¹
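The unbinned maximum likelihood fit for $w$ described above can be illustrated with a toy study. This is a simplified sketch under stated assumptions, not the fit used in the analysis: it assumes perfect $\Delta t$ resolution, no background and no outliers, uses round illustrative values for the lifetime and mass difference, and all function names are hypothetical.

```python
import math
import random

TAU_B0 = 1.5     # ps, illustrative value only (assumption)
DELTA_MD = 0.5   # ps^-1, illustrative value only (assumption)


def generate_events(n, w_true, rng):
    """Toy (dt, is_OF) events: ideal resolution, no background, no outliers."""
    events = []
    for _ in range(n):
        # |dt| follows the exponential envelope; the sign is symmetric.
        dt = rng.expovariate(1.0 / TAU_B0) * rng.choice((-1.0, 1.0))
        # Probability that a pair decaying at dt is tagged as opposite flavor.
        p_of = 0.5 * (1.0 + (1.0 - 2.0 * w_true) * math.cos(DELTA_MD * dt))
        events.append((dt, rng.random() < p_of))
    return events


def neg_log_likelihood(w, events):
    """-ln L for the product over OF and SF events; w-independent factors are dropped."""
    total = 0.0
    for dt, is_of in events:
        sign = 1.0 if is_of else -1.0
        total -= math.log(1.0 + sign * (1.0 - 2.0 * w) * math.cos(DELTA_MD * dt))
    return total


def fit_w(events, lo=0.0, hi=0.5, iterations=40):
    """Ternary search; -ln L is convex in w (each term is -log of a linear function)."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if neg_log_likelihood(m1, events) < neg_log_likelihood(m2, events):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


rng = random.Random(12345)
toy_events = generate_events(5000, w_true=0.20, rng=rng)
w_fit = fit_w(toy_events)
```

In the real fit each event's probability also includes the resolution convolution, the background shape and the outlier term, so the likelihood is no longer this simple closed form; the toy only shows how the OF/SF classification and $\Delta t$ of each event constrain $w$.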

Systematic errors are summarized in Table 8. The uncertainties in semileptonic
modes are the dominant components. The systematic errors due to
the uncertainty of the signal fraction are estimated by changing each fraction
or parameter representing the fraction by $\pm 1\sigma$, repeating the fit and
adding the deviations from the main result in quadrature. For the hadronic
modes, we consider the effect of changing the signal region in the energy difference
$\Delta E = E_B^{\rm cms} - E_{\rm beam}$ by $\pm 10$ MeV and beam-energy constrained mass
$M_{\rm bc} = \sqrt{(E_{\rm beam})^2 - (p_B^{\rm cms})^2}$ by $\pm 3$ MeV, where $E_B^{\rm cms}$ and $p_B^{\rm cms}$ are the energy
and the momentum of the reconstructed $B$ meson in the $\Upsilon(4S)$ center of mass
system. Each parameter in the $\Delta t$ background shape is varied by $\pm 1\sigma$, the fit
is repeated and the errors are added in quadrature. We also check a possible
difference between the $\Delta t$ background shape in the signal region and the background
control sample using MC simulation and include it in the systematic
error. We use MC to estimate the $B \to D^{**}\ell\nu$ background in the semileptonic
sample, where $D^{**}$ denotes non-resonant $D^*\pi$ and heavier charmed mesons.
We evaluate the systematic errors due to uncertainties of the branching fractions
of each $D^{**}$ component by successively setting each $D^{**}$ component in
turn to unity in the MC (with all others set to zero), and repeating the fit.
We take the largest variation in $w_l$ as the systematic error. The effect of the
uncertainty in the $B^0 \to D^{**-}\ell^+\nu$ background for the semileptonic mode is
also included. Systematic errors originating from the vertex reconstruction
are estimated by modifying the vertex quality and track quality selection for
the tagging side by $\pm 10\%$ and by varying the $B$ flight length assumed in the
interaction point constraint by $\pm 10$ $\mu$m. We also check the result with different
$\Delta t$ ranges, $\pm 40$ ps or $\pm 100$ ps instead of the nominal range of $\pm 70$ ps.
The resolution function uncertainty is obtained by modifying each parameter
in the resolution function by $\pm 1\sigma$. The dependence on the $B^0$ lifetime and
$\Delta m_d$ is measured by varying the measured values by $\pm 1\sigma$. We test for a
bias in the reconstruction with large-statistics signal MC samples and observe
no statistically significant discrepancy; therefore no systematic error due to
reconstruction bias is included.
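The quadrature combination described above can be reproduced for one column of Table 8. The sketch below is only a cross-check under assumptions: the numbers are the first-column ($w_1$) entries as read from Table 8, the reading that they are quoted in units of $10^{-2}$ is an inference from comparing with Table 6, and the variable names are hypothetical.

```python
import math

# (positive, negative) systematic errors on w_1, read from the first column of Table 8.
W1_SYSTEMATICS = {
    "semileptonic signal fraction": (0.17, 0.17),
    "semileptonic background shape": (0.07, 0.07),
    "semileptonic D** composition": (0.22, 0.13),
    "semileptonic background w": (0.12, 0.13),
    "hadronic signal fraction": (0.03, 0.02),
    "hadronic background shape": (0.01, 0.01),
    "hadronic background mixing": (0.04, 0.00),
    "vertex reconstruction": (0.08, 0.10),
    "resolution parameters": (0.02, 0.01),
    "lifetime and delta m_d": (0.02, 0.02),
}


def quadrature_sum(values):
    """Square root of the sum of squares of independent error contributions."""
    return math.sqrt(sum(v * v for v in values))


total_up = quadrature_sum(up for up, _ in W1_SYSTEMATICS.values())
total_down = quadrature_sum(down for _, down in W1_SYSTEMATICS.values())
print(f"+{total_up:.1f} / -{total_down:.1f}")
```

Both sums round to 0.3, in line with the $\pm 0.3$ quoted in the "Total" row for $w_1$.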

The total effective efficiency obtained by summing over the six $r$ regions is

$$\epsilon_{\rm eff} = \sum_l \epsilon_l (1 - 2w_l)^2 = (28.8 \pm 0.6)\%,$$

where $\epsilon_l$ is the event fraction in each of the six regions. The error includes
both statistical and systematic contributions.
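As a worked check of this sum, the central values of Table 6 can be combined directly. This is a sketch for illustration only: the value $w_6 = 0.020$ is taken on the assumption that the partly illegible Table 6 entry agrees with the 0.020 listed in Table 7, and no error propagation is attempted.

```python
# (event fraction eps_l, wrong tag fraction w_l) for the six r regions,
# central values as read from Table 6 (w_6 = 0.020 is an assumed reading).
R_REGIONS = [
    (0.398, 0.458),
    (0.146, 0.336),
    (0.104, 0.228),
    (0.122, 0.160),
    (0.094, 0.112),
    (0.136, 0.020),
]


def effective_efficiency(regions):
    """Sum of eps_l * (1 - 2 w_l)^2 over the r regions."""
    return sum(eps * (1.0 - 2.0 * w) ** 2 for eps, w in regions)


eps_eff = effective_efficiency(R_REGIONS)
print(f"{100.0 * eps_eff:.1f}%")  # prints 28.8%
```

The highest $r$ region holds only 13.6% of the events but contributes nearly half of the total, because the weight $(1 - 2w_l)^2$ is close to one there.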



¹ The latest update of $\sin 2\phi_1$ based on a data sample of $152 \times 10^6$ $B\bar{B}$ pairs uses
separate $w_l$ values for $B^0$-tagged and $\bar{B}^0$-tagged samples.
5 Summary



We have developed a flavor tagging algorithm for the measurement of the CP
violation parameter $\sin 2\phi_1$ and other measurements at the Belle experiment.
The algorithm is designed to maximize the effective tagging efficiency, $\epsilon(1 - 2w)^2$,
where $\epsilon$ is the efficiency and $w$ is the wrong tag fraction. We introduce
two variables $q$ and $r$, where $q$ is the flavor charge of a $B$ meson and $r$ is the
MC-determined event-by-event flavor dilution factor. The value $r$ is related to
$w$ through $r = 1 - 2w$. In our approach, we independently determine $w$ from
control data samples, which are self-tagging events, and check the relation
$r = 1 - 2w$. We therefore avoid possible biases that may be introduced by the
use of MC. We have achieved an effective efficiency of $29.3 \pm 0.1\%$ according to
MC simulation, which is a significant improvement over the value of $22.2 \pm 0.1\%$
achieved by the classical method of flavor tagging using only leptons and kaons.

The flavor tagging performance is estimated from samples of decays into the
self-tagged modes, $B^0 \to D^{*-}\ell^+\nu$, $D^{*-}\pi^+$, $D^{*-}\rho^+$ and $D^-\pi^+$. A total of 65332
events are used to evaluate the performance. We obtain an effective tagging
efficiency of $(28.8 \pm 0.6)\%$, which agrees well with the MC expectation.
A Control Samples for Flavor Tagging

Control samples of self-tagged $B$ mesons are used to measure the $w_l$ values directly
from data, and minimize the systematic uncertainty in our $\sin 2\phi_1$ measurement.
The control samples are also used for the evaluation of the flavor tagging
performance $\epsilon_{\rm eff}$ and to check flavor tagging inputs and outputs. We
use the semileptonic decay mode $B^0 \to D^{*-}\ell^+\nu$ [18] and the hadronic modes
$B^0 \to D^{(*)-}\pi^+$ and $D^{*-}\rho^+$ [19], and their charge conjugates, as control samples.
We fully reconstruct those $B$-meson decays and tag the $b$-flavor of the
associated $B$ mesons using the algorithm described in Section 3. Using the
78 fb$^{-1}$ data sample corresponding to $85 \times 10^6$ $B\bar{B}$ pairs, we select 47317
semileptonic decay candidates with 79.2% purity and 18015 hadronic decay
candidates with purities of 87.9%, 84.3% and 72.6% for the $D^-\pi^+$, $D^{*-}\pi^+$
and $D^{*-}\rho^+$ modes, respectively. In the $w$ measurement and $\epsilon_{\rm eff}$ evaluation
described in Section 4, the effect of $B^0$-$\bar{B}^0$ mixing is taken into account by fitting
the time dependence of the flavor mixing. For the figures that show flavor tagging
input variables and outputs $q \cdot r$ in Section 3, we use $\chi_d = 0.181 \pm 0.004$ [22]
to take the effect of mixing into account. In those figures, the number of entries
in an event for $B^0$ ($\bar{B}^0$) is calculated as
$\frac{N_{\rm OF}}{n_{\rm OF}} \cdot (1 - \chi_d) - \frac{N_{\rm SF}}{n_{\rm SF}} \cdot \chi_d$, where $N_{\rm OF}$
and $n_{\rm OF}$ are the entries and the total number of events from the background
subtracted $D^{(*)-}X^+$ control samples, respectively, and $N_{\rm SF}$ and
$n_{\rm SF}$ are those from the background subtracted $D^{(*)+}X^-$ control
samples.
Table 1

Discriminants for the electron-like and muon-like track categories

variable                   description                                          number of bins
charge                     track charge                                         2
e or $\mu$                 identifier of an electron or muon                    2
lepton ID                  lepton-ID quality value                              4
$p^{\rm cms}_\ell$         the magnitude of the momentum in the CMS             11
$\theta_{\rm lab}$         the polar angle in the laboratory frame              6
$M_{\rm recoil}$           the hadronic recoil mass                             10
$P^{\rm cms}_{\rm miss}$   the magnitude of the missing momentum in the CMS     6
total                                                                           31680



Table 2

Discriminants for the slow-pion-like track category

variable               description                                                    number of bins
charge                 track charge                                                   2
$p_{\rm lab}$          the magnitude of the momentum in the laboratory frame          10
$\theta_{\rm lab}$     the polar angle in the laboratory frame                        10
$\cos\alpha_{\rm thr}$ the cosine of the angle between the slow pion candidate and    7
                       the thrust axis of the tag-side particles in the CMS
$\pi/e$ ID             ID probability from $dE/dx$                                    5
total                                                                                 7000
Table 3

Discriminants for the kaon-like track category

variable               description                                                      number of bins
charge                 track charge                                                     2
w/ or w/o $K_S^0$      switch to indicate whether the event contains $K_S^0$'s or not   2
$p_{\rm cms}$          momentum in the center-of-mass system of $\Upsilon(4S)$          21
$\theta_{\rm lab}$     polar angle in the laboratory system                             18
$K/\pi$ ID             quality value of $K/\pi$ ID ($dE/dx$, TOF, ACC)                  13
total                                                                                   19656



Table 4

Discriminants for the $\Lambda$-like particle category

variable               description                                                      number of bins
flavor                 flavor of $\Lambda$ ($\Lambda$ or $\bar{\Lambda}$)
$K_S^0$ presence       switch that indicates whether the event contains $K_S^0$'s
                       or not
$M_{p\pi}$             invariant mass of the pion and the proton candidate at the
                       secondary vertex
$\theta_{\rm defl}$    the angle difference between the $\Lambda$ momentum vector and
                       the direction of the $\Lambda$ vertex point from the nominal IP
$\Delta z$             $z$ difference of the two tracks at the $\Lambda$ vertex point
total                                                                                   32
Table 5

Discriminants for event-level tagging

variable                   description                                                      number of bins
$(q \cdot r)_l$            $(q \cdot r)$ for the highest $r$ from outputs of lepton          25
                           category
$(q \cdot r)_{K/\Lambda}$  combined $(q \cdot r)$ calculated from the outputs $(q \cdot r)_i$, 35
                           where the subscript $i$ runs over all outputs of the kaon
                           and $\Lambda$ categories
$(q \cdot r)_{\pi_s}$      $q \cdot r$ for the highest $r$ from outputs of slow pion         19
                           category
total                                                                                       16625



Table 6

Event fractions $\epsilon_l$, wrong tag fractions $w_l$, and effective tagging efficiencies
$\epsilon^l_{\rm eff} = \epsilon_l (1 - 2w_l)^2$ for each $r$ interval. The first errors and second errors of $w_l$ are
statistical and systematic uncertainties, respectively. The errors of $\epsilon^l_{\rm eff}$ are statistical and
systematic combined. The event fractions are obtained from the $J/\psi K_S^0$ simulation.

$l$   $r$ interval     $\epsilon_l$   $w_l$                                      $\epsilon^l_{\rm eff}$
1     0.000 - 0.250    0.398          0.458 ± 0.005 ± 0.003                      0.003 ± 0.001
2     0.250 - 0.500    0.146          0.336 ± 0.008 ± 0.004                      0.016 ± 0.002
3     0.500 - 0.625    0.104          0.228 ± 0.009 +0.004/-0.006                0.031 ± 0.002
4     0.625 - 0.750    0.122          0.160 ± 0.007 +0.005/-0.004                0.056 ± 0.003
5     0.750 - 0.875    0.094          0.112 ± 0.008 ± 0.004                      0.056 ± 0.003
6     0.875 - 1.000    0.136          0.020 +0.005/-0.004 +0.005/-0.004          0.126 +…/-…
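A quick consistency check of the relation $r = 1 - 2w$ can be made from the central values in Table 6. The sketch below is illustrative only: it tests whether each measured dilution $1 - 2w_l$ falls inside its own $r$ interval, which is a coarser comparison than the one in Fig. 9 (which uses the mean $r$ of each region from MC), and it assumes $w_6 = 0.020$ as the reading of the partly illegible last row.

```python
# r interval boundaries and central w_l values as read from Table 6
# (w_6 = 0.020 is an assumed reading, consistent with Table 7).
R_INTERVALS = [
    (0.000, 0.250),
    (0.250, 0.500),
    (0.500, 0.625),
    (0.625, 0.750),
    (0.750, 0.875),
    (0.875, 1.000),
]
W_CENTRAL = [0.458, 0.336, 0.228, 0.160, 0.112, 0.020]


def dilution(w):
    """Dilution factor 1 - 2w for a wrong tag fraction w."""
    return 1.0 - 2.0 * w


inside = [
    lo <= dilution(w) <= hi
    for (lo, hi), w in zip(R_INTERVALS, W_CENTRAL)
]
print(all(inside))  # prints True
```

All six measured dilutions land inside the MC-defined $r$ interval of their region, which is the behaviour expected if the MC-determined $r$ describes the data.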



Table 7

Wrong tag fractions $w_l$ for $B^0$-tagged ($q = +1$) and $\bar{B}^0$-tagged ($q = -1$) events
separately. The error is statistical only.

$l$   $r$ interval     $w_l$ for $q = +1$      $w_l$ for $q = -1$
1     0.000 - 0.250    0.462 ± 0.007           0.453 ± 0.007
2     0.250 - 0.500    0.339 ± 0.011           0.333 ± 0.011
3     0.500 - 0.625    0.211 ± 0.012           0.246 +…/-…
4     0.625 - 0.750    0.148 ± 0.010           0.173 ± 0.011
5     0.750 - 0.875    0.101 ± 0.011           0.122 ± 0.011
6     0.875 - 1.000    0.020 +…/-0.006         0.020 ± 0.006
Table 8

Summary of systematic errors

source                           $w_1$         $w_2$         $w_3$         $w_4$         $w_5$         $w_6$
Semileptonic signal fraction     ±0.17         ±0.26         ±0.30         +0.26/-0.25   +0.31/-0.29   +0.19/-0.18
Semileptonic background shape    ±0.07         +0.11/-0.12   ±0.14         ±0.12         ±0.14         +0.32/-0.10
Semileptonic $D^{**}$ composition +0.22/-0.13  +0.13/-0.01   +0.09/-0.10   +0.18/-0.10   +0.14/-0.11   +0.12/-0.06
Semileptonic background $w$      +0.12/-0.13   ±0.09         ±0.09         ±0.07         +0.07/-0.08   ±0.03
Hadronic signal fraction         +0.03/-0.02   +0.04/-0.10   +0.08/-0.04   +0.11/-0.03   +0.11/-0.04   ±0.02
Hadronic background shape        ±0.01         ±0.01         ±0.02         ±0.01         ±0.01         < 0.01
Hadronic background mixing       +0.04/-0      < 0.01        +0.05/-0      +0.03/-0      +0.08/-0      +0.04/-0
Vertex reconstruction            +0.08/-0.10   +0.09/-0.27   +0.13/-0.44   +0.27/-0.21   +0.05/-0.13   +0.09/-0.22
Resolution parameters            +0.02/-0.01   +0.03/-0.02   ±0.04         +0.04/-0.03   +0.05/-0.04   ±0.06
Lifetime and $\Delta m_d$        ±0.02         +0.10/-0.09   ±0.15         +0.17/-0.16   +0.20/-0.19   +0.22/-0.20
Total                            ±0.3          ±0.4          +0.4/-0.6     +0.5/-0.4     ±0.4          +0.5/-0.4


[Figure 1: flow diagram. Information on charged tracks enters the track-level look-up tables (slow pion, Lambda, kaon, lepton). The track with the largest "r" is selected for the slow pion and for the lepton categories, and a combined "q.r" is calculated for the kaon and Lambda categories. The three outputs feed the event-level look-up table, which gives the flavor information "q" and "r".]

Fig. 1. A schematic diagram of the two-stage flavor tagging. See the text for the
definition of the parameters "q" and "r".
Fig. 2. (a) $p^{\rm cms}_\ell$, (b) $P^{\rm cms}_{\rm miss}$, and (c) $M_{\rm recoil}$ distributions for $B^0$ and $\bar{B}^0$. The points
with error bars are control sample data (see Appendix A). The solid and dotted
histograms are the EvtGen-MC and QQ-MC, respectively. All distributions are made
with a requirement on lepton ID in Table 1 to remove the large pion background.
The upper two figures and lower two figures in (a), (b) or (c) are for $\ell^-$-like tracks
and for $\ell^+$-like tracks, respectively. The upper left and lower right figures in (a),
(b) or (c) contain primary leptons from $B$ decay, while upper right and lower left
figures in (a), (b) or (c) contain secondary leptons from $D$ decay.
Fig. 3. (a) $p_{\rm lab}$ and (b) $\cos\alpha_{\rm thr}$ distributions of slow pions for $B^0$ and $\bar{B}^0$. The
points with error bars are control sample data (see Appendix A), while the solid and
dotted histograms are the EvtGen-MC and QQ-MC, respectively. All distributions
are made with a requirement on $\pi/e$ ID to remove low momentum electrons from
photon conversions and $\pi^0$ Dalitz decays. The upper two figures and lower two
figures in (a) or (b) are for $\pi^-$-like tracks and for $\pi^+$-like tracks, respectively. The
upper right and lower left figures in (a) or (b) contain slow pions from $D^{*\pm}$ decays.




Fig. 4. $p_{\rm cms}$ distributions of kaons for $B^0$ and $\bar{B}^0$. The points with error bars are
control sample data (see Appendix A), while the solid and dotted histograms are
the EvtGen-MC and QQ-MC, respectively. $K/\pi$ ID in Table 3 is required to exclude
the dominating pion background. The upper two figures and lower two figures are
for $K^-$-like tracks and for $K^+$-like tracks, respectively. The upper right and lower
left figures contain kaons from the cascade $b \to c \to s$ transition.
Fig. 5. $M_{p\pi}$ distributions of $\Lambda$ candidates for $B^0$ and $\bar{B}^0$. The points with error bars
are the control sample data (see Appendix A). The solid and dotted histograms are
the EvtGen-MC and QQ-MC samples, respectively. The upper two figures and lower
two figures are for $\Lambda$ candidates and for $\bar{\Lambda}$ candidates, respectively. The upper left
figure and lower right figure contain $\Lambda$ particles from the cascade $b \to c \to s$ transition.
Fig. 6. Distributions of event layer input variables (a) $(q \cdot r)_l$, (b) $(q \cdot r)_{K/\Lambda}$ and
(c) $(q \cdot r)_{\pi_s}$ for $B^0$ and $\bar{B}^0$. The points with error bars are control sample data (see
Appendix A). The solid and dotted histograms are the EvtGen-MC and QQ-MC,
respectively. For these distributions, the $(q \cdot r)$ values of corresponding track-layer
outputs are obtained with the EvtGen-MC lookup table. Events with no input
tracks that have $r = 0$ are excluded. The fractions of such events are 36%, 10% and
41% for the lepton, the kaon/$\Lambda$ and the slow pion categories, respectively.
Fig. 7. The $q \cdot r$ distribution for $B^0$ and $\bar{B}^0$. The points with error bars are control
sample data (see Appendix A), while the solid and dotted histograms are the
EvtGen-MC and the QQ-MC, respectively.
Fig. 8. Measured asymmetries between the OF events and the SF events (OF-SF
asymmetries) for six regions of r obtained from control samples. The definition of 
OF and SF is given in the text. The background is not subtracted in the asymmetry 
plots. Solid curves show the result of the unbinned maximum likelihood fit. 
Fig. 9. Measured $1 - 2w$ as a function of the mean value of $r$ in each $r$ region. The
$\langle r \rangle$ values are taken from the $J/\psi K_S^0$ MC.